Peta-Scale Embedded Photonics Architecture for Distributed Deep Learning Applications
Authors
Abstract
As Deep Learning (DL) models grow larger and more complex, training jobs are increasingly distributed across multiple Computing Units (CUs) such as GPUs and TPUs. Each CU processes a sub-part of the model and synchronizes results with the others, and communication among these CUs has emerged as a key bottleneck in the training process. In this work, we present SiPAC, a Silicon Photonic Accelerated Compute cluster. SiPAC accelerates DL training by means of two co-designed components: a photonic physical layer and a novel collective algorithm. The physical layer exploits embedded photonics to bring peta-scale I/O directly to the computing units, and the cluster uses resonator-based optical wavelength selectivity to realize hardware multi-casting. The collective algorithm builds on this multi-casting primitive. This combination expedites a variety of collective communications commonly employed in DL training and has the potential to drastically ease communication bottlenecks. We demonstrate the feasibility of realizing the SiPAC architecture through 1) a testbed experiment in which an array of comb laser wavelengths is shuffled by cascaded ring switches, each selecting and forwarding wavelengths to increase the effective communication bandwidth and hence demonstrate the multicasting primitive, and 2) a four-GPU testbed running a realistic DL workload that achieves a 22% system-level performance improvement relative to a similarly sized leaf-spine topology. Large-scale simulations show 1.4× to 5.9× communication time reductions compared to state-of-the-art compute clusters for representative collective communications.
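To make the benefit of the hardware multi-casting primitive concrete, the following is a minimal sketch (not the paper's actual algorithm) comparing the number of communication rounds an all-gather collective needs on a unicast ring versus a WDM fabric where each node can broadcast its shard on its own wavelength. The function names and the simple step-count model are illustrative assumptions, not part of the SiPAC design.

```python
def ring_allgather_steps(n):
    """Classic ring all-gather: each node forwards one shard per step,
    so n-1 steps are needed before every node holds all n shards."""
    return n - 1

def multicast_allgather_steps(n, wavelengths):
    """With per-node wavelength multicast, nodes broadcast concurrently.
    If at least n wavelengths are available, one round suffices;
    otherwise nodes take turns over ceil(n / wavelengths) rounds."""
    return -(-n // wavelengths)  # ceiling division

def simulate_multicast_allgather(shards):
    """Toy data-movement check: every broadcast reaches all nodes in
    one round, so every node ends up with every shard."""
    n = len(shards)
    received = [dict() for _ in range(n)]
    for src, shard in enumerate(shards):  # all broadcasts in one round
        for dst in range(n):
            received[dst][src] = shard
    return received

if __name__ == "__main__":
    n = 8
    print("ring steps:", ring_allgather_steps(n))              # 7
    print("multicast steps:", multicast_allgather_steps(n, n))  # 1
    out = simulate_multicast_allgather([f"shard{i}" for i in range(n)])
    assert all(len(r) == n for r in out)
```

Under this toy model the multicast fabric collapses the all-gather from O(n) rounds to a constant number, which is the intuition behind why a multicast-aware collective algorithm can ease the communication bottleneck the abstract describes.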
Similar resources
Architecture and Applications for a Distributed Embedded Firewall
The distributed firewall is an important new line of network defense. It provides fine-grained access control to augment the protections afforded by the traditional perimeter firewall. To be effective, though, a distributed firewall must satisfy two critical requirements. First, it must embrace a protection model that acknowledges that everything behind the firewall may not be trustworthy. The ...
How to scale distributed deep learning?
Training time on large datasets for deep neural networks is the principal workflow bottleneck in a number of important applications of deep learning, such as object classification and detection in automatic driver assistance systems (ADAS). To minimize training time, the training of a deep neural network must be scaled beyond a single machine to as many machines as possible by distributing the ...
The Willow Architecture: Comprehensive Survivability for Large-Scale Distributed Applications
The Willow architecture is a comprehensive approach to survivability in critical distributed applications. Survivability is achieved in a deployed system using a unique combination of (a) fault avoidance by disabling vulnerable network elements intentionally when a threat is detected or predicted, (b) fault elimination by replacing system software elements when faults are discovered, and (c) fa...
Peta-Scale Computing
In a few short years, computers capable of over one Petaflops performance will become a reality. The most likely approach for first successfully reaching this performance level will involve several thousands of parallel processing elements. What are the key considerations for building such systems? What are the software requirements and demands? How will applications scale? How reliable are the...
Performance-Optimum Superscalar Architecture for Embedded Applications
Embedded applications are widely used in portable devices such as wireless phones, personal digital assistants, laptops, etc. High throughput and real time requirements are especially important in such data-intensive tasks. Therefore, architectures that provide the required performance are the most desirable. On the other hand, processor performance is severely related to the average memory acc...
Journal
Journal title: Journal of Lightwave Technology
Year: 2023
ISSN: 0733-8724, 1558-2213
DOI: https://doi.org/10.1109/jlt.2023.3276588